Computational Methods and Graphical Processing Units for Real-time Control of Tomographic Adaptive Optics on Extremely Large Telescopes.
Ground-based optical telescopes suffer from limited imaging resolution as a result of the effects of atmospheric turbulence on the incoming light. Adaptive optics technology has so far been very successful in correcting these effects, providing nearly diffraction-limited images. Extremely Large Telescopes will require more complex adaptive optics configurations that introduce the need for new mathematical models and optimal solvers. In addition, the amount of data to be processed in real time is greatly increased, making conventional computational methods and hardware inefficient; this motivates the study of advanced computational algorithms and their implementation on parallel processors. Graphical Processing Units (GPUs) are massively parallel processors that have so far demonstrated a very high increase in speed compared to CPUs and other devices, and they have a high potential to meet the real-time restrictions of adaptive optics systems. This thesis focuses on the study and evaluation of existing proposed computational algorithms with respect to computational performance, and on their implementation on GPUs. Two basic methods, one direct and one iterative, are implemented and tested. The results presented provide an evaluation of the basic concept upon which other algorithms are based, and demonstrate the benefits of using GPUs for adaptive optics.
Improved Acceleration of the GPU Fourier Domain Acceleration Search Algorithm
We present an improvement of our implementation of the Correlation Technique
for the Fourier Domain Acceleration Search (FDAS) algorithm on Graphics
Processor Units (GPUs) (Dimoudi & Armour 2015; Dimoudi et al. 2017). Our new
improved convolution code, which uses our custom GPU FFT code, is between 2.5 and
3.9 times faster than our cuFFT-based implementation (on an NVIDIA P100)
and allows for a wider range of filter sizes than our previous version. By
using this new version of our convolution code in FDAS we have achieved a 44%
performance increase over our previous best implementation. It is also
approximately 8 times faster than the existing PRESTO GPU implementation of
FDAS (Luo 2013). This work is part of the AstroAccelerate project (Armour et
al. 2002), a many-core accelerated time-domain signal processing library for
radio astronomy.
Comment: proceedings of the ADASS XXVII conference, 4 pages
GPU Fast Convolution via the Overlap-and-Save Method in Shared Memory
We present an implementation of the overlap-and-save method, a method for the convolution of very long signals with short response functions, which is tailored to GPUs. We have implemented several FFT algorithms (using the CUDA programming language), which exploit GPU shared memory, allowing for GPU accelerated convolution. We compare our implementation with an implementation of the overlap-and-save algorithm utilizing the NVIDIA FFT library (cuFFT). We demonstrate that by using a shared-memory-based FFT, we can achieve significant speed-ups for certain problem sizes and lower the memory requirements of the overlap-and-save method on GPUs.
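As a rough illustration of the overlap-and-save idea named in this abstract, the following CPU-only Python sketch splits a long signal into overlapping blocks, filters each block with an FFT-based circular convolution, and discards the aliased samples at the start of each block. It is not the authors' CUDA code: the function names (fft, ifft, overlap_save, direct_convolve) and the block size nfft=16 are assumptions chosen for the example, and the recursive FFT stands in for a GPU FFT.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalised); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def ifft(spectrum):
    """Inverse FFT with the 1/n normalisation applied."""
    n = len(spectrum)
    return [v / n for v in fft(spectrum, inverse=True)]

def overlap_save(signal, h, nfft=16):
    """Linear convolution of a long signal with a short filter h, block by block."""
    m = len(h)
    step = nfft - (m - 1)  # valid (non-aliased) output samples per block
    assert step > 0 and nfft & (nfft - 1) == 0, "nfft must be a power of two > len(h) - 1"
    filt = fft(list(h) + [0.0] * (nfft - m))  # filter spectrum, computed once
    out_len = len(signal) + m - 1
    n_blocks = -(-out_len // step)  # ceiling division
    # m-1 leading zeros act as history for the first block; trailing zeros fill the last block.
    padded = [0.0] * (m - 1) + list(signal)
    padded += [0.0] * (n_blocks * step + m - 1 - len(padded))
    out = []
    for b in range(n_blocks):
        block = padded[b * step : b * step + nfft]  # consecutive blocks overlap by m-1 samples
        product = [a * c for a, c in zip(fft(block), filt)]
        y = ifft(product)
        out.extend(v.real for v in y[m - 1:])  # drop the first m-1 wrapped-around samples
    return out[:out_len]

def direct_convolve(signal, h):
    """Reference O(N*M) linear convolution, used only to check the result."""
    out = [0.0] * (len(signal) + len(h) - 1)
    for i, s in enumerate(signal):
        for j, c in enumerate(h):
            out[i + j] += s * c
    return out
```

Calling overlap_save(sig, h) and direct_convolve(sig, h) on the same real-valued inputs should agree to within floating-point rounding. Because each block is processed independently, the per-block FFT, multiply, and inverse FFT are natural candidates for parallel execution.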
A GPU implementation of the Correlation Technique for Real-time Fourier Domain Pulsar Acceleration Searches
The study of binary pulsars enables tests of general relativity. Orbital
motion in binary systems causes the apparent pulsar spin frequency to drift,
reducing the sensitivity of periodicity searches. Acceleration searches are
methods that account for the effect of orbital acceleration. Existing methods
are currently computationally expensive, and the vast amount of data that will
be produced by next generation instruments such as the Square Kilometre Array
(SKA) necessitates real-time acceleration searches, which in turn requires the
use of High Performance Computing (HPC) platforms. We present our
implementation of the Correlation Technique for the Fourier Domain Acceleration
Search (FDAS) algorithm on Graphics Processor Units (GPUs). The correlation
technique is applied as a convolution with multiple Finite Impulse Response
filters in the Fourier domain. Two approaches are compared: the first uses the
NVIDIA cuFFT library for applying Fast Fourier Transforms (FFTs) on the GPU,
and the second contains a custom FFT implementation in GPU shared memory. We
find that the FFT shared memory implementation performs between 1.5 and 3.2
times faster than our cuFFT-based application for smaller but sufficient filter
sizes. It is also 4 to 6 times faster than the existing GPU and OpenMP
implementations of FDAS. This work is part of the AstroAccelerate project, a
many-core accelerated time-domain signal processing library for radio
astronomy.
Comment: 20 pages, 9 figures. Accepted for publication in ApJ.
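The abstract above describes the correlation technique as a convolution of the Fourier spectrum with multiple FIR filters. A toy Python sketch of that structure follows, with one output row per filter and one column per frequency bin. The function names and the filter taps are hypothetical placeholders for illustration, not the acceleration templates or the GPU implementation used in the paper.

```python
def convolve_same(spectrum, taps):
    """Direct complex convolution with a centred filter; output has one value per input bin."""
    n, m = len(spectrum), len(taps)
    half = m // 2
    out = []
    for i in range(n):
        acc = 0j
        for j in range(m):
            k = i + half - j
            if 0 <= k < n:  # treat bins outside the spectrum as zero
                acc += taps[j] * spectrum[k]
        out.append(acc)
    return out

def power_plane(spectrum, filter_bank):
    """Rows: one per filter (trial acceleration). Columns: frequency bins. Values: power."""
    return [[abs(v) ** 2 for v in convolve_same(spectrum, taps)] for taps in filter_bank]

def strongest_peak(plane):
    """Return (power, filter_index, frequency_bin) of the largest value in the plane."""
    return max((p, r, c) for r, row in enumerate(plane) for c, p in enumerate(row))

# Toy input: a signal whose power is smeared over three neighbouring bins.
spectrum = [0j, 0j, 0j, 1 + 0j, 1 + 0j, 1 + 0j, 0j, 0j]
# Hypothetical filter bank: a one-tap filter (no drift) and a three-tap filter.
filter_bank = [[1 + 0j], [1 + 0j, 1 + 0j, 1 + 0j]]
peak = strongest_peak(power_plane(spectrum, filter_bank))
```

In this toy case the three-tap filter collects the smeared power back into a single bin, so the strongest peak lies in the second row. This is the sense in which a filter matched to the frequency drift recovers sensitivity lost by a plain periodicity search.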
Bits Missing: Finding Exotic Pulsars Using bfloat16 on NVIDIA GPUs
The Fourier domain acceleration search (FDAS) is an effective technique for detecting faint binary pulsars in large radio astronomy data sets. This paper quantifies the sensitivity impact of reducing numerical precision in the graphics processing unit (GPU)-accelerated FDAS pipeline of the AstroAccelerate (AA) software package. The prior implementation used IEEE-754 single-precision in the entire binary pulsar detection pipeline, spending a large fraction of the runtime computing GPU-accelerated fast Fourier transforms. AA has been modified to use bfloat16 (and IEEE-754 double-precision to provide a “gold standard” comparison) within the Fourier domain convolution section of the FDAS routine. Approximately 20,000 synthetic pulsar filterbank files representing binary pulsars were generated using SIGPROC with a range of physical parameters. They have been processed using bfloat16, single-precision, and double-precision convolutions. All bfloat16 peaks are within 3% of the predicted signal-to-noise ratio of their corresponding single-precision peaks. Of 14,971 “bright” single-precision fundamental peaks above a power of 44.982 (our experimentally measured highest noise value), 14,602 (97.53%) have a peak in the same acceleration and frequency bin in the bfloat16 output plane, while in the remaining 369 the nearest peak is located in the adjacent acceleration bin. There is no bin drift measured between the single- and double-precision results. The bfloat16 version of FDAS achieves a speedup of approximately 1.6× compared to single-precision. A comparison between AA and the PRESTO software package is presented using observations collected with the GMRT of PSR J1544+4937, a 2.16 ms black widow pulsar in a 2.8 hr compact orbit.
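To give a feel for the precision that bfloat16 retains, the Python sketch below rounds a value to the nearest bfloat16 by keeping the upper 16 bits of its IEEE-754 single-precision encoding (1 sign bit, 8 exponent bits, 7 mantissa bits), using round-to-nearest-even. The function name to_bfloat16 is an assumption for this example, NaN and overflow handling are omitted, and the sketch says nothing about how AstroAccelerate or the GPU hardware performs the conversion.

```python
import struct

def to_bfloat16(x):
    """Round x to the nearest bfloat16 value and return it as a Python float.

    Assumes a finite input well inside the float32 range; NaN and
    overflow-to-infinity cases are not handled in this sketch.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]  # float32 bit pattern
    # Round to nearest, ties to even, on the 16 bits that will be dropped.
    rounding = 0x7FFF + ((bits >> 16) & 1)
    bits = ((bits + rounding) >> 16) << 16
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]

def relative_error(x):
    """Relative difference between x and its bfloat16 rounding."""
    return abs(to_bfloat16(x) - x) / abs(x)
```

With 7 explicit mantissa bits, neighbouring bfloat16 values between 32 and 64 are 0.25 apart, so a power of 44.982 rounds to 45.0 under this scheme, and the relative rounding error of any normal value is at most 2^-8 (about 0.4%).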